Variable (programming)

In computer programming, a variable is a symbolic name given to some known or unknown quantity or value, for the purpose of allowing the name to be used independently of the value it represents. A variable name in computer source code is associated with a data storage location and thus also with that location's contents, which generally change during the course of program execution.

Variables in programming do not directly correspond to the concept of variables in mathematics. The closest mathematical concepts are substitution and evaluation. A variable in mathematics often represents an entire domain or set of values that satisfy some abstract criteria, whereas a computer variable represents only one actual value at any given moment.

The value of a computing variable is not necessarily derived by an equation or formula as it is in mathematics. In computing, a variable is typically employed in a repetitive process: assigned a value in one place, then used elsewhere, then reassigned a new value and used again in the same way. (See iteration.) Variables in computer logic are frequently given long names to make them relatively descriptive of their use, whereas variables in mathematics have terse, one- or two-character names for brevity in transcription and manipulation.

A computer variable can represent any kind of data that can be stored in a computer system, from simple True/False conditions to numbers, names, pictures, sounds, large segments of audio/video or arrays of information describing multi-dimensional entities and their behavior (e.g., within computer models). Classic examples of variables in business computing represent numerical quantities such as monetary amounts, counters, totals and subtotals, or alphanumeric fields containing names of persons, places, products or other descriptive text.

A variable has three essential attributes: a symbolic name (also known as an identifier), a data location (generally in storage or memory, consisting of an address and a length), and the value, represented by the data contents of that location. These attributes are often assigned at separate times during program execution.

Variables often also have a fourth attribute, a type or class which specifies the kind of information the variable stores. Variable type affects the format in which its value is to be stored, the amount of memory it occupies and the way its contents are to be manipulated, interpreted and expressed. A variable can be classed as numeric or non-numeric, or defined in any of several subtypes (e.g., integer, decimal, currency, scientific, string, text, boolean, etc.) and in various ranks and sizes containing singular or multiple values, depending on what is defined and supported by the programming language.

While the variable name, type and location generally remain fixed, the data stored in the location can be altered during program execution, causing its value to change or vary (hence the name). The current value of the variable is the datum actually stored in the memory location associated with it.

When the program is made ready for execution, the location of the value is substituted for the name wherever it is referenced in the procedural portion of the program source code. At run time, the contents of that location are used in place of the symbolic name for calculations and other data manipulation.

A variable is generally referenced by a symbolic name, hereinafter referred to as the "identifier". In contemporary programming languages, non-identical identifiers can refer to the same variable, its location or contents.

Identifiers referencing a variable

An identifier referencing a variable can be used to access the variable in order to read out the value, or alter the value, or modify the attributes of the variable, such as access permission, locks, semaphores, etc.

For instance, a variable might be referenced by the identifier "total_count" and contain the number 1956. If the same variable is also referenced by the identifier "x", and the value of the variable is altered to 2009 through "x", then reading the value through "total_count" yields 2009 and not 1956.
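The aliasing described above can be sketched in Python. This is only an approximation: Python names refer to objects rather than directly to storage locations, so the effect is visible only when both names refer to the same mutable object. The one-element list used here as a stand-in for the variable's storage is an assumption made for illustration.

```python
# A one-element list stands in for the variable's storage location
# (an illustrative assumption; Python names refer to objects).
total_count = [1956]

# "x" becomes a second identifier for the same object, not a copy of it.
x = total_count

# Altering the value through "x" ...
x[0] = 2009

# ... is visible through "total_count", because both names refer to one object.
print(total_count[0])    # prints 2009
print(x is total_count)  # prints True
```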

If in a particular programming language a variable can only be referenced by a single identifier, that can simply be called the name of the variable. Otherwise we can talk about one of the names of the variable. For instance, in the previous example, the "total_count" is a name of the variable in question, and "x" is another name of the same variable.

Actions on a variable

In imperative programming languages, values can generally be accessed or changed at any time. However, in pure functional and logic languages, variables are bound to expressions and keep a single value during their entire lifetime due to the requirements of referential transparency. In imperative languages, the same behavior is exhibited by constants, which are typically contrasted with normal variables.

Depending on the type system of a programming language, variables may only be able to store a specified datatype (e.g. integer or string). Alternatively a datatype may be associated only with the current value, allowing a single variable to store anything supported by the programming language.
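The second case, in which the datatype belongs to the current value rather than to the variable, can be illustrated with a short Python sketch. The name item and its values are hypothetical and chosen only for illustration.

```python
# In Python the type belongs to the current value, not to the name.
item = 42
first_type = type(item).__name__    # "int"

item = "forty-two"                  # the same name now refers to a string
second_type = type(item).__name__   # "str"

print(first_type, second_type)      # prints: int str
```

In a language of the first kind, the second assignment would typically be rejected before the program runs.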

Naming conventions

Unlike their mathematical counterparts, programming variables and constants commonly take multiple-character names, e.g. COST or total. Single-character names are most commonly used only for auxiliary variables; for instance, i, j, k for array index variables.

Some naming conventions are enforced at the language level as part of the language syntax and involve the format of valid identifiers. In almost all languages, variable names cannot start with a digit (0-9) and cannot contain whitespace characters. Whether, which, and when punctuation marks are permitted in variable names varies from language to language; many languages only permit the underscore (_) in variable names and forbid all other punctuation. In some programming languages, specific (often punctuation) characters (known as sigils) are prefixed or appended to variable identifiers to indicate the variable's type.
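As one concrete case, Python exposes its own identifier rules through the built-in string method str.isidentifier(). The candidate names below are hypothetical, and the results apply to Python only; other languages draw the line differently.

```python
# str.isidentifier() reports whether a string is a syntactically valid
# Python identifier. It checks the rules of Python only.
candidates = ["total_count", "2nd_total", "total count", "_private", "total-count"]

validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(f"{name!r}: {ok}")
```

Only "total_count" and "_private" are accepted: the others start with a digit, contain whitespace, or contain punctuation other than the underscore.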

Case-sensitivity of variable names also varies between languages, and some languages require the use of a certain case in naming certain entities.[note 1] Most modern languages are case-sensitive; some older languages are not. Some languages reserve certain forms of variable names for their own internal use; in many languages, names beginning with two underscores ("__") fall under this category.

However, beyond the basic restrictions imposed by a language, the naming of variables is largely a matter of style. At the machine code level, variable names are not used, so the exact names chosen do not matter to the computer. Names therefore serve to identify variables; beyond that, they are simply a tool for programmers to make programs easier to write and understand.

Programmers often create and adhere to code style guidelines which offer guidance on naming variables or impose a precise naming scheme. Shorter names are faster to type but are less descriptive; longer names often make programs easier to read and the purpose of variables easier to understand. However, extreme verbosity in variable names can also lead to less comprehensible code.

In spreadsheets

In a spreadsheet, a cell may contain a formula with references to other cells. Such a cell reference is a kind of variable; its value is the value of the referenced cell (see also: reference (computer science)).

Scope and extent

The scope of a variable describes where in a program's text the variable may be used, while the extent (or lifetime) describes when in a program's execution a variable has a (meaningful) value. Strictly speaking, scope is a property of the name of the variable, and extent is a property of the variable itself.

A variable name's scope affects its extent.

Scope is a lexical aspect of a variable. Most languages define a specific scope for each variable (as well as any other named entity), which may differ within a given program. The scope of a variable is the portion of the program code for which the variable's name has meaning and for which the variable is said to be "visible". Entrance into that scope typically begins a variable's lifetime and exit from that scope typically ends its lifetime. For instance, a variable with "lexical scope" is meaningful only within a certain block of statements or subroutine. Variables accessible only within a certain function are termed "local variables". A "global variable", or one with indefinite scope, may be referred to anywhere in the program.
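The difference between a local and a global variable can be sketched in Python. The names counter, step and increment_local are hypothetical, and the sketch assumes that it is run as a top-level module.

```python
counter = 0  # a global (module-level) variable, visible throughout this module


def increment_local():
    step = 1                 # "step" is local: the name has meaning only in this function
    return counter + step    # the global "counter" is visible here for reading


result = increment_local()

try:
    step                     # outside the function, the name "step" is not in scope
    step_visible = True
except NameError:
    step_visible = False

print(result, step_visible)  # prints: 1 False
```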

Extent, on the other hand, is a runtime (dynamic) aspect of a variable. Each binding of a variable to a value can have its own extent at runtime. The extent of the binding is the portion of the program's execution time during which the variable continues to refer to the same value or memory location. A running program may enter and leave a given extent many times, as in the case of a closure.

In portions of code, a variable in scope may never have been given a value, or its value may have been destroyed. Such variables are described as "out of extent" or "unbound". In many languages, it is an error to try to use the value of a variable when it is out of extent. In other languages, doing so may yield unpredictable results. Such a variable may, however, be assigned a new value, which gives it a new extent. By contrast, it is permissible for a variable binding to extend beyond its scope, as occurs in Lisp closures and C static local variables. When execution passes back into the variable's scope, the variable may once again be used.
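A binding that outlives its scope can be shown with a Python closure, which is used here as an analogue of the Lisp closures mentioned above. The names make_counter, next_value and count are hypothetical.

```python
def make_counter():
    count = 0            # the name "count" is in scope only inside make_counter

    def next_value():
        nonlocal count   # refer to the enclosing function's binding
        count += 1
        return count

    return next_value


# make_counter has returned, so its scope has been left,
# yet the binding of "count" lives on inside the closure.
counter = make_counter()
first = counter()
second = counter()
print(first, second)     # prints: 1 2
```

Each call to counter re-enters the scope in which count is visible, and finds the value left there by the previous call.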

For space efficiency, the memory space needed for a variable may be allocated only when the variable is first used and freed when it is no longer needed. A variable is only needed when it is in scope, but beginning each variable's lifetime when it enters scope may give space to unused variables. To avoid wasting such space, compilers often warn programmers if a variable is declared but not used.

It is considered good programming practice to make the scope of variables as narrow as feasible so that different parts of a program do not accidentally interact with each other by modifying each other's variables. Doing so also prevents action at a distance. Common techniques for doing so are to have different sections of a program use different namespaces, or to make individual variables "private" through either dynamic variable scoping or lexical variable scoping.

Many programming languages employ a reserved value (often named null or nil) to indicate an invalid or uninitialized variable.
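In Python the reserved value is named None. The sketch below uses it to mark a variable that does not yet hold a meaningful value; the name best_score and the sample scores are hypothetical.

```python
best_score = None            # no score has been recorded yet

for score in [12, 7, 19]:
    # The first score always replaces None; later ones must beat the current best.
    if best_score is None or score > best_score:
        best_score = score

print(best_score)            # prints 19
```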

Typing

In statically-typed languages such as Java or ML, a variable also has a type, meaning that only values of a given class (or set of classes) can be stored in it. A variable of a primitive type holds a value of that exact primitive type. A variable of a class type can hold a null reference or a reference to an object whose type is that class type or any subclass of that class type. A variable of an interface type can hold a null reference or a reference to an instance of any class that implements the interface. A variable of an array type can hold a null reference or a reference to an array.

In dynamically-typed languages such as Python, it is values, not variables, which carry type. In Common Lisp, both situations exist simultaneously: a variable is given a type (if undeclared, it is assumed to be T, the universal supertype) which exists at compile time. Values also have types, which can be checked and queried at runtime.

Typing of variables also allows polymorphisms to be resolved at compile time. However, this is different from the polymorphism used in object-oriented function calls (referred to as virtual functions in C++) which resolves the call based on the value type as opposed to the supertypes the variable is allowed to have.

Variables often store simple data, like integers and literal strings, but some programming languages allow a variable to store values of other datatypes as well. Such languages may also enable functions to be parametrically polymorphic. These functions operate like variables to represent data of multiple types. For example, a function named length may determine the length of a list. Such a length function may be parametrically polymorphic by including a type variable in its type signature, since the number of elements in the list is independent of the elements' types.
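A length function of this kind can be sketched with Python type hints, where T plays the role of the type variable in the signature. This is an illustration only: Python does not enforce type hints at run time, so the annotation documents the polymorphism rather than checking it.

```python
from typing import List, TypeVar

T = TypeVar("T")   # a type variable standing for any element type


def length(items: List[T]) -> int:
    """Count the elements without ever inspecting their type."""
    count = 0
    for _ in items:
        count += 1
    return count


print(length([1, 2, 3]))     # prints 3
print(length(["a", "b"]))    # prints 2
```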

Parameters

The formal parameters of functions are also referred to as variables. For instance, in this Python code segment,

 def addtwo(x):
    return x + 2
 
 addtwo(5)  # yields 7

The variable named x is a parameter because it is given a value when the function is called. The integer 5 is the argument which gives x its value. In most languages, function parameters have local scope. This specific variable named x can only be referred to within the addtwo function (though of course other functions can also have variables called x).

Memory allocation

The specifics of variable allocation and the representation of their values vary widely, both among programming languages and among implementations of a given language. Many language implementations allocate space for local variables, whose extent lasts for a single function call, on the call stack; this memory is automatically reclaimed when the function returns. More generally, in name binding, the name of a variable is bound to the address of some particular block (contiguous sequence) of bytes in memory, and operations on the variable manipulate that block. Referencing is more common for variables whose values have large or unknown sizes when the code is compiled. Such variables reference the location of the value instead of storing the value itself, which is allocated from a pool of memory called the heap.

Bound variables have values. A value, however, is an abstraction, an idea; in implementation, a value is represented by some data object, which is stored somewhere in computer memory. The program, or the runtime environment, must set aside memory for each data object and, since memory is finite, ensure that this memory is yielded for reuse when the object is no longer needed to represent some variable's value.

Objects allocated from the heap must be reclaimed once they are no longer needed. In a garbage-collected language (such as C#, Java, and Lisp), the runtime environment automatically reclaims objects when extant variables can no longer refer to them. In non-garbage-collected languages, such as C, the program (and the programmer) must explicitly allocate memory and later free it so that it can be reused. Failure to do so leads to memory leaks, in which the heap is depleted as the program runs, risking eventual failure from exhausting available memory.
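Automatic reclamation can be observed in Python, which is also garbage-collected, by using a weak reference as a probe. The sketch below assumes CPython; the exact moment at which an object is reclaimed depends on the implementation. The names Node, demo and probe are hypothetical.

```python
import gc
import weakref


class Node:
    """A trivial class; instances of user-defined classes can be weakly referenced."""


def demo():
    node = Node()
    probe = weakref.ref(node)          # a weak reference does not keep the object alive
    alive_before = probe() is not None

    del node                           # remove the only variable referring to the object
    gc.collect()                       # ask the collector to run now
    alive_after = probe() is not None
    return alive_before, alive_after


print(demo())
```

Under CPython the object is reported as alive before the variable is deleted and as reclaimed afterwards.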

When a variable refers to a data structure created dynamically, some of its components may be only indirectly accessed through the variable. In such circumstances, garbage collectors (or analogous program features in languages that lack garbage collectors) must deal with a case where only a portion of the memory reachable from the variable needs to be reclaimed.

Interpolation

Variable interpolation (also variable substitution, variable expansion) is the process of evaluating an expression or string literal containing one or more variables, yielding a result in which the variables are replaced with their corresponding values in memory. It is a specialized instance of concatenation.

Languages that support variable interpolation include Perl, PHP, Ruby, and most Unix shells. In these languages, variable interpolation occurs only when the string literal is double-quoted, not when it is single-quoted. In Perl, PHP, and the shells, variables are recognized because they start with a sigil (typically "$").

For example, the following Perl code (which would work identically in PHP):

$name = "Nakul";            
print "${name} said Hello World to the crowd of people.";

produces the output:

Nakul said Hello World to the crowd of people.

Ruby uses the "#{...}" syntax for interpolation and allows any expression, not just variables, to be interpolated. Other languages may support more advanced interpolation with a special formatting function, such as printf, in which the first argument, the format, specifies the pattern in which the remaining arguments are substituted.
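For comparison, both styles can be sketched in Python, which is not among the languages listed above. Formatted string literals (f-strings, available from Python 3.6) interpolate expressions written inside braces, and the % operator applies a printf-style format. The crowd size of 3 is a made-up value used only to show a second substitution.

```python
name = "Nakul"

# f-string: expressions inside {} are evaluated and substituted.
interpolated = f"{name} said Hello World to the crowd of people."

# printf-style formatting: the format string comes first, the values follow.
formatted = "%s said Hello World to a crowd of %d people." % (name, 3)

print(interpolated)
print(formatted)
```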

Notes

  1. For example, Haskell requires that names of types start with a capital letter.